# R Options
options(stringsAsFactors=FALSE)

# Required libraries
library(dplyr)
library(tidyr)
library(ggplot2)
library(patchwork)
library(openxlsx)
library(ggpubr)
library(ggsci)
library(knitr)
library(kableExtra)
library(Seurat)

# Source plotting and analysis functions
source("functions_plotting.R")
source("functions_analysis.R")

Dataset description

Read input data

In this first section of the report, we read the 10X RNA and HTO input data from the files produced by CellRanger and set up a Seurat object:
* barcodes.tsv.gz: All cell barcodes
* features.tsv.gz: (Ensembl) ID, name, and type for each gene and HTO
* matrix.mtx.gz: Raw RNA and HTO counts

## An object of class Seurat 
## 17607 features across 16916 samples within 2 assays 
## Active assay: RNA (17599 features)
##  1 other assay present: HTO
## An object of class Seurat 
## 17607 features across 3000 samples within 2 assays 
## Active assay: RNA (17599 features)
##  1 other assay present: HTO
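
As a rough illustration of the reading step, the sketch below wraps the relevant Seurat calls in a helper. The function name `read_10x_rna_hto` and its `data_dir` argument are hypothetical, and the feature-type labels "Gene Expression" and "Antibody Capture" are an assumption about how the HTOs were declared to CellRanger; this is not necessarily how the report's own code does it.

```r
# Sketch only: `data_dir` is a hypothetical path to a CellRanger output folder
# containing barcodes.tsv.gz, features.tsv.gz and matrix.mtx.gz.
read_10x_rna_hto <- function(data_dir) {
  # With several feature types, Read10X() returns a named list of matrices.
  counts <- Seurat::Read10X(data.dir = data_dir)

  # Assumption: the RNA and HTO matrices carry these two feature-type labels.
  stopifnot(
    is.list(counts),
    all(c("Gene Expression", "Antibody Capture") %in% names(counts))
  )

  # RNA counts form the default assay ...
  sc <- Seurat::CreateSeuratObject(
    counts = counts[["Gene Expression"]],
    assay = "RNA"
  )
  # ... and the HTO counts are added as a second assay over the same barcodes.
  sc[["HTO"]] <- Seurat::CreateAssayObject(counts = counts[["Antibody Capture"]])
  sc
}
```

Calling `read_10x_rna_hto("path/to/folder")` and printing the result should give a summary of the same shape as the output above (two assays, RNA active).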

Demultiplexing Hashtag Oligos (HTO)

This section of the report shows how cells are assigned to their sample-of-origin.

Normalisation of HTO counts

We start the analysis by normalising the raw HTO counts: the HTO counts for each cell are divided by the total counts for that cell, multiplied by 10,000, and then natural-log transformed.
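
The normalisation can be written out in a few lines of base R. This is a minimal sketch on a toy matrix, not the report's own code; `hto_counts` and `log_normalise` are illustrative names, and the use of `log1p` (that is, log(1 + x), so zero counts stay at zero) is an assumption about the exact transform.

```r
# Toy features-by-cells matrix (values are made up, not taken from this report).
hto_counts <- matrix(
  c(90,  5,  5,   # cell1
     2, 40,  8,   # cell2
    10, 10, 80),  # cell3
  nrow = 3,
  dimnames = list(c("htoA", "htoB", "htoC"), c("cell1", "cell2", "cell3"))
)

log_normalise <- function(counts, scale_factor = 10000) {
  # Divide each column (cell) by its total count, then scale.
  scaled <- sweep(counts, 2, colSums(counts), "/") * scale_factor
  # Assumption: natural log of (1 + x).
  log1p(scaled)
}

hto_norm <- log_normalise(hto_counts)
```

After this step, undoing the log with `expm1(hto_norm)` gives columns that each sum to the scale factor, which is a quick sanity check.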

Classification of cells based on normalised HTO data

We assign each cell to its sample-of-origin and annotate negative cells, which cannot be assigned to any sample, and doublet cells, which are assigned to two samples.

## Cutoff for htoA : 222 reads
## Cutoff for htoB : 53 reads
## Cutoff for htoC : 99 reads
## Cutoff for htoD : 286 reads
## Cutoff for htoE : 231 reads
## Cutoff for htoF : 221 reads
## Cutoff for htoG : 437 reads
## Cutoff for htoH : 127 reads
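
To illustrate how per-HTO cutoffs translate into labels, here is a simplified base R sketch. It reuses the first three cutoffs printed above but applies them to a made-up count matrix; `classify_cells` is a hypothetical helper, and the rule "positive means strictly above the cutoff" is an assumption. It does not show how the cutoffs themselves are estimated.

```r
# Cutoffs for three HTOs as printed above; the counts below are toy values.
cutoffs <- c(htoA = 222, htoB = 53, htoC = 99)
raw_hto <- matrix(
  c(500,  10,  20,   # cell1: only htoA above its cutoff
     30,  12,  40,   # cell2: nothing above its cutoff
    400,   8, 300,   # cell3: htoA and htoC above their cutoffs
     15, 120,   9),  # cell4: only htoB above its cutoff
  nrow = 3,
  dimnames = list(names(cutoffs), paste0("cell", 1:4))
)

classify_cells <- function(counts, cutoffs) {
  stopifnot(identical(rownames(counts), names(cutoffs)))
  # Row i of the matrix is compared with cutoffs[i] (column-wise recycling).
  positive <- counts > cutoffs
  n_positive <- colSums(positive)
  # Name of the first positive HTO per cell (only used when exactly one is positive).
  first_positive <- rownames(counts)[apply(positive, 2, which.max)]
  label <- ifelse(n_positive == 0, "Negative",
                  ifelse(n_positive == 1, first_positive, "Doublet"))
  setNames(label, colnames(counts))
}

hto_labels <- classify_cells(raw_hto, cutoffs)
```

In this sketch any cell with more than one positive HTO is labelled a doublet, which includes the two-sample case described above.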

Visualisation of raw and normalised HTO data

This section of the report visualises raw and normalised HTO data to understand whether the demultiplexing step has worked well.

Pairs of raw (top) and normalised (bottom) HTO counts are visualised to confirm mutual exclusivity in singlet cells. Data points correspond to the measured counts per HTO; colours correspond to the assigned sample-of-origin.

The following ridge plots visualise the enrichment of assigned sample-of-origin for the respective normalised HTO counts.

Lastly, we compare the number of genes detected per cell across the classification groups.

Remove cells classified as doublet or negative

This section of the report states the number of cells that remain after negative and doublet cells are removed.

## An object of class Seurat 
## 17607 features across 2422 samples within 2 assays 
## Active assay: RNA (17599 features)
##  1 other assay present: HTO
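
The filtering itself amounts to keeping only the columns (cells) whose label is neither negative nor doublet. A minimal base R sketch on toy data follows; the label strings "Negative" and "Doublet" and all object names are assumptions made for illustration.

```r
# Toy per-cell labels and a matching toy RNA count matrix (genes x cells).
cell_labels <- c(cell1 = "htoA", cell2 = "Negative", cell3 = "Doublet",
                 cell4 = "htoB", cell5 = "htoA")
rna_counts <- matrix(
  1:10, nrow = 2,
  dimnames = list(c("geneX", "geneY"), names(cell_labels))
)

# Keep cells that were assigned to exactly one sample.
keep <- !(cell_labels %in% c("Negative", "Doublet"))
singlet_labels <- cell_labels[keep]
rna_singlets <- rna_counts[, keep, drop = FALSE]

# Cells per category before filtering, and cells remaining afterwards.
table(cell_labels)
ncol(rna_singlets)
```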

Preliminary pre-processing of RNA data

This section of the report provides first insights into your RNA dataset based on a preliminary pre-processing of the RNA data using the usual scRNA-seq workflow.

Visualisation of demultiplexed RNA data

We use a UMAP to visualise and explore the dataset. The goal is to place similar cells together in 2D space and to learn about the biology underlying the data. Cells are colour-coded according to the assigned sample-of-origin.
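
For orientation, a standard Seurat workflow leading to such a plot might look like the sketch below. The helper `run_preliminary_umap`, its arguments, and the choice of the first 10 principal components are assumptions for illustration (`RunUMAP` needs the dimensions to be specified); the report's own code may differ.

```r
# Sketch only: `sc` is a Seurat object with an RNA assay, and `group_by` is the
# name of a metadata column holding the assigned sample-of-origin (hypothetical).
run_preliminary_umap <- function(sc, group_by, n_dims = 10) {
  sc <- Seurat::NormalizeData(sc, verbose = FALSE)         # log-normalise RNA counts
  sc <- Seurat::FindVariableFeatures(sc, verbose = FALSE)  # select variable genes
  sc <- Seurat::ScaleData(sc, verbose = FALSE)             # centre and scale
  sc <- Seurat::RunPCA(sc, verbose = FALSE)                # linear reduction
  # Assumption: use the first `n_dims` principal components as UMAP input.
  sc <- Seurat::RunUMAP(sc, dims = seq_len(n_dims), verbose = FALSE)
  # Colour cells by the chosen metadata column.
  Seurat::DimPlot(sc, reduction = "umap", group.by = group_by)
}
```

The function returns the plot object, so it can be printed directly or combined with other panels.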

Take care not to misread a UMAP:

  • Parameters influence the plot (we use defaults here)
  • Cluster sizes relative to each other mean nothing, since the method has a local notion of distance
  • Distances between clusters might not mean anything
  • You may need more than one plot

For a nice read to intuitively understand UMAP, see https://pair-code.github.io/understanding-umap/.

Write out demultiplexed data

Finally, demultiplexed RNA data are written back to file.